Profitable Scheduling on Multiple Speed-Scalable 
Processors* 



Peter Kling and Peter Pietrzyk 

Heinz Nixdorf Institute and Computer Science Department 
University of Paderborn 

Fürstenallee 11, 33102 Paderborn, Germany



Abstract 



We present a new online algorithm for profit-oriented scheduling on multiple speed-scalable
processors. Moreover, we provide a tight analysis of the algorithm's competitiveness. Our results
generalize and improve upon work by Chan, Lam, and Li [10], which considers a single
speed-scalable processor. Using significantly different techniques, we can not only extend their model
to multiprocessors but also prove an enhanced and tight competitive ratio for our algorithm.

In our scheduling problem, jobs arrive over time and are preemptable. They have different 
workloads, values, and deadlines. The scheduler may decide not to finish a job but instead to 
suffer a loss equaling the job's value. However, to process a job's workload until its deadline
the scheduler must invest a certain amount of energy. The cost of a schedule is the sum of
lost values and invested energy. In order to finish a job the scheduler has to determine which
processors to use and set their speeds accordingly. A processor's energy consumption is power
$P_\alpha(s)$ integrated over time, where $P_\alpha(s) = s^\alpha$ is the power consumption when running at speed
$s$. Since we consider the online variant of the problem, the scheduler has no knowledge about
future jobs. This problem was introduced by Chan, Lam, and Li [10] for the case of a single
processor. They presented an online algorithm which is $(\alpha^\alpha + 2e\alpha)$-competitive. We provide an
online algorithm for the case of multiple processors with an improved competitive ratio of $\alpha^\alpha$.

1 Introduction 


From an economical point of view, the value of energy has increased tremendously during the 
last decades. This applies not only to the energy consumed in small-scale computer systems 
but especially to the energy consumption in large data centers. According to current reports 
(e.g., Barroso and Hölzle [6]), the decisive factors regarding the costs of running a data center
are mostly the cooling process and the actual computations rather than the acquisition of 
the necessary hardware. Thus, in order to maximize their revenue, data centers strive to 
minimize the energy consumption while still guaranteeing a sufficiently high quality of service
to their customers. One way to approach this goal is through technical solutions that improve the
involved hardware. However, coupling such solutions with canonical or standard algorithms

wastes much potential. Only by designing sophisticated algorithms can one hope to fully 
exploit their power and possibilities. A prominent example of this is dynamic speed scaling, a
technology that adapts a processor's speed according to the current workload (e.g., Intel SpeedStep
or AMD PowerNow!). Simply decreasing the speed at times of small load may lower the total
energy consumption substantially. However, a lower speed often also implies a lower quality 
of service, which in turn may impair the data center's revenue. One needs clever algorithms 
to fully utilize speed scaling and to achieve a provably good or even optimal profit. 
But how exactly should a data center make use of speed scaling in order to maximize
profit? On a relatively basic level, one can imagine a data center's situation as follows: Jobs 
of different sizes and values arrive over time at the data center. For finishing a customer's 
job in time, the data center receives a payment corresponding to the job's value. However, to 
finish a job the data center has to invest an amount of energy depending on the job's size and 
potential time constraints. Investing into low-value jobs that require much energy may lower
the profit. Even processing jobs whose values seem to justify the energy investment may be 
bad, as this may hinder the efficient processing of more lucrative jobs that arrive later. Thus, 
one has to carefully choose not only how and when to process the different jobs but also 
which to process at all. We propose an algorithm that handles this scenario provably well 
and improves upon the former best known result. Moreover, we generalize the model to the 
important case of multiple processors (until now, only a single speed-scalable processor was 
considered). Our analysis is partly based on an intriguing new technique recently suggested 
by Gupta, Krishnaswamy, and Pruhs [12]. We adapt and extend it to suit our problem and 
show its large potential compared to the classical analysis methods prevailing in this area 
(see "Our Contribution" later in this section). 



Related Work. 

There exists plenty of work concerning energy-efficient scheduling strategies in both theoretical 
and practical contexts. Dynamic speed scaling (also referred to as dynamic voltage scaling) 
is one of the most important technical tools to save energy in modern systems. It allows 
the scheduler to dynamically adapt the system's speed to the current workload. A recent 
survey by Albers [1] gives a good and compact overview of the state of the art of algorithmic
research in this area. In the following, we concentrate on models for speed-scalable processors 
and jobs with deadline constraints. Theoretical work in this area has been initiated by Yao, 
Demers, and Shenker [14] . They considered a single speed-scalable processor that processes 
preemptable jobs which arrive over time and come with different deadlines and workloads.
Yao, Demers, and Shenker studied the question of how to finish all the jobs in an energy-minimal
way. In their seminal work [14], they modeled the power consumption $P_\alpha(s)$ of a
processor running at speed $s$ by a constant degree polynomial $P_\alpha(s) = s^\alpha$. Here, the energy
exponent $\alpha$ is assumed to be a constant $\alpha > 2$. In classical CMOS-based systems $\alpha = 3$
usually yields a suitable approximation of the actual power consumption. Yao, Demers, and
Shenker developed an optimal offline algorithm, known as YDS, as well as the two online
algorithms Optimal Available (OA) and Average Rate (AVR). Up to now, OA remains one
of the most important algorithms in this area, being an essential part of many algorithms for
both the original problem and its manifold variations. Using a rather complex but
elegant amortized potential function argument, Bansal, Kimbrel, and Pruhs [3] proved that
OA is exactly $\alpha^\alpha$-competitive. They also proposed a new algorithm, named BKP, which
achieves a competitive ratio of essentially $2e^{\alpha+1}$. The algorithm qOA presented by Bansal
et al. [5] is particularly well suited for small values of $\alpha$, where it outperforms both OA and
BKP. In this work, the authors also proved that no deterministic algorithm can achieve a
competitive ratio of better than $e^{\alpha-1}/\alpha$. In their recent work, Albers, Antoniadis, and Greiner
[2] presented an optimal offline algorithm for the multiprocessor case. Moreover, using this
algorithm, they were able to also extend OA to the multiprocessor case and proved the same
competitive ratio of $\alpha^\alpha$ as in the single processor case.

All results mentioned so far are concerned only with the energy necessary to finish all
jobs. With respect to the profitability aspect, the two most relevant results for us are due
to Chan, Lam, and Li [10] and Pruhs and Stein [13]. Both proposed a model incorporating
profitability into classical energy-efficient scheduling. In the simplest case, jobs have values
and the scheduler is no longer required to finish all jobs. Instead, it can decide to not process
jobs whose values do not justify the foreseeable energy investment necessary to complete
them. The objective is to maximize the profit [13] or, similarly, to minimize the loss [10]. As
argued by the authors, the latter model has the benefit of being a direct generalization of the
classical model by Yao, Demers, and Shenker. For maximizing the profit, Pruhs and Stein
[13] showed that in order to achieve a bounded competitive ratio, resource augmentation
is necessary and gave a scalable online algorithm. For minimizing the loss, Chan, Lam,
and Li [10] gave an $(\alpha^\alpha + 2e\alpha)$-competitive algorithm. Another very important and recent
work is due to Gupta, Krishnaswamy, and Pruhs [12] and considers the Online Generalized
Assignment Problem (OnGAP). The authors showed an interesting relation to a multitude
of problems in the context of speed-scalability (not only for scheduling). They developed a
convex programming formulation of the problem and applied well-known techniques from
convex optimization. In particular, they used a greedy primal-dual approach as known from
linear programming (see, e.g., [9]). This way, they designed an online algorithm for the
classical model by Yao, Demers, and Shenker (no job values; one processor) which is very
similar to OA and proved the exact same competitive ratio of $\alpha^\alpha$.



Our Contribution. 

We develop and analyze a new online algorithm for scheduling valuable jobs on multiple 
speed-scalable processors. Our algorithm improves upon known results in two respects: For 
the single processor case it improves the best known competitive ratio from $\alpha^\alpha + 2e\alpha$ to $\alpha^\alpha$.
Moreover, this constant competitive ratio holds even for the case of multiple processors. To 
the best of our knowledge, this is the first algorithm that is able to handle the multiprocessor 
case in this scenario. We also show that our analysis is tight in that the proven competitive 
ratio is optimal for our algorithm. 

Our analysis is significantly different from the typical potential function argument which is 
dominant in the analysis of online algorithms in this research area. Instead, we make use of a 
framework recently suggested by Gupta, Krishnaswamy, and Pruhs [12]. It utilizes well-known
tools from convex optimization, especially duality theory and primal-dual algorithms. We 
develop a convex programming formulation and design a greedy primal-dual online algorithm 
for the problem at hand. Compared to the original framework, we have to overcome the 
additional issue of integral variables in our convex program that are caused by the new 
profitability aspect. Moreover, the handling of multiple processors proves to be a challenging 
task. It not only causes a much more complex objective function in the convex program but 
also makes it harder to grasp the structural properties of the resulting schedule. Our result 
shows that this technique is not only suitable for the classical energy-efficient scheduling 
model but also for more complex variations of it, as conjectured by Gupta, Krishnaswamy, and 
Pruhs. It is interesting to note that, in terms of the analysis, this approach goes back to the 
roots of Yao, Demers, and Shenker's model, as the optimality proof of the YDS algorithm [I] 
is based on a similar convex programming formulation and the well-known KKT conditions 
from convex optimization [8]. Our algorithm can be seen as greedily increasing the convex
program's variables while maintaining a relaxed version of these KKT conditions. 



2 Model & Preliminaries 

We consider a system of $m$ speed-scalable processors. That is, each processor can be set to
any speed $s \in \mathbb{R}_{\geq 0}$ (independently from the others). When running at speed $s$, the power
consumption of a single processor is given by the power function $P_\alpha(s) = s^\alpha$. Here, the
constant parameter $\alpha \in \mathbb{R}_{>1}$ is called the energy exponent. A problem instance consists of a
set $J = \{1, 2, \dots, n\}$ of $n$ jobs. Each job $j \in J$ is associated with a release time $r_j$, a deadline
$d_j$, a workload $w_j$, and a value $v_j$. A schedule $S$ describes if and how the different jobs are
processed by the system. It consists of $m$ speed functions $S_i\colon \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$ ($i \in \{1, 2, \dots, m\}$)
and a job assignment policy. The speed function $S_i$ dictates the speed $S_i(t)$ of the $i$-th
processor at time $t$. The job assignment policy decides which jobs to run on the processors.
At any time $t$, it may schedule at most one job per processor, and each job can be processed
by at most one processor at any given time (i.e., we consider nonparallel jobs). Moreover,
jobs are preemptive: a running job may be interrupted at any time and continued later on,
possibly on a different processor. The total work processed by processor $i$ between time $t_1$
and $t_2$ is $\int_{t_1}^{t_2} S_i(t)\,dt$. Similarly, the overall energy consumed by this processor during the
same time is $\int_{t_1}^{t_2} P_\alpha(S_i(t))\,dt$. Let $s_j(t)$ denote the speed used to process job $j$ at time $t$. We
say job $j$ is finished under schedule $S$ if $S$ processes (at least) $w_j$ units of $j$'s work during the
interval $[r_j, d_j)$. That is, if we have $\int_{r_j}^{d_j} s_j(t)\,dt \geq w_j$.
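To make the two integrals concrete, the following minimal sketch evaluates them for a piecewise-constant speed profile. The representation of a profile as `(start, end, speed)` segments and the function names `work_done` and `energy_used` are assumptions made for illustration only; they are not part of the model.

```python
from typing import List, Tuple

# A piecewise-constant speed profile: (start, end, speed) segments.
# This representation is a hypothetical choice for illustration.
Segment = Tuple[float, float, float]


def work_done(profile: List[Segment], t1: float, t2: float) -> float:
    """Integral of the speed over [t1, t2)."""
    total = 0.0
    for start, end, speed in profile:
        overlap = max(0.0, min(end, t2) - max(start, t1))
        total += speed * overlap
    return total


def energy_used(profile: List[Segment], t1: float, t2: float, alpha: float) -> float:
    """Integral of the power speed**alpha over [t1, t2)."""
    total = 0.0
    for start, end, speed in profile:
        overlap = max(0.0, min(end, t2) - max(start, t1))
        total += speed ** alpha * overlap
    return total


# Two ways to process 4 units of work within 2 time units.
steady = [(0.0, 2.0, 2.0)]
bursty = [(0.0, 1.0, 3.0), (1.0, 2.0, 1.0)]
```

Both profiles process the same work, but because the power function is convex the uneven profile consumes more energy. This is the reason energy-minimal schedules run at speeds that are as even as possible.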

A given schedule S may not finish all n jobs. In this case, the total value of unfinished 
jobs is considered as a loss. Thus, the cost of S is defined as the sum of the total energy 
consumption and the total value of unfinished jobs. More formally, if $J_{rej}$ denotes the set of
unfinished (aka rejected) jobs under schedule $S$, we define the cost of schedule $S$ by

$$\mathrm{cost}(S) := \sum_{i=1}^{m} \int_0^\infty P_\alpha(S_i(t))\,dt + \sum_{j \in J_{rej}} v_j. \tag{1}$$
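The trade-off behind this cost function can be illustrated with a single job on a single processor. The sketch below is not the paper's algorithm; it only compares the two possible costs for one isolated job, assuming the job would run at constant speed over its whole window. The function names are hypothetical.

```python
def energy_to_finish(workload: float, window: float, alpha: float) -> float:
    """Energy for processing `workload` at constant speed over `window` time units."""
    speed = workload / window
    return window * speed ** alpha


def cheaper_option(workload: float, window: float, value: float, alpha: float):
    """Return ('finish' or 'reject', cost) for one isolated job.

    Finishing costs the invested energy, rejecting costs the lost value.
    """
    energy = energy_to_finish(workload, window, alpha)
    if energy <= value:
        return "finish", energy
    return "reject", value
```

For a job with workload 4, a window of length 2, and $\alpha = 3$, finishing costs 16 units of energy, so the job is worth processing only if its value is at least 16. In the online setting the decision is harder, because later jobs may compete for the same time window.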

Our goal is to construct a low-cost schedule in the online scenario of the problem. That is,
the job set $J$ is not known a priori, but rather revealed over time. In particular, we do not
know the total number of jobs, and the existence as well as the attributes of a job $j \in J$ are
revealed only when the job is released at time $r_j$. We measure the quality of algorithms for
this online problem by their competitive ratio: Given an online algorithm $A$, let $A(J)$ denote
the resulting schedule for job set $J$. The competitive ratio of $A$ is defined as

$$\sup_{J} \frac{\mathrm{cost}(A(J))}{\mathrm{cost}(\mathrm{OPT}(J))}, \tag{2}$$

where $\mathrm{OPT}(J)$ denotes an optimal schedule for the job set $J$. Note that, by definition, the
competitive ratio is at least one.

2.1 Convex Programming Formulation 

In the following, we develop a convex programming formulation of the above (offline) 
scheduling problem to aid us in the design and analysis of our online algorithm (cf. Section 3).
Following an idea by Bingham and Greenstreet [7], we partition time into atomic intervals
using the jobs' release times and deadlines. The goal of our convex program is to compute
what portion of each job to process during the different atomic intervals in an optimal
schedule. Once we have such a fixed work assignment, we use a deterministic algorithm by
Chen et al. [11] to efficiently compute an energy-minimal way to process the corresponding
work on the $m$ processors in this interval. The energy consumption of the resulting schedule in
the interval can be written as a convex function $V_k$ of the work assignment. This function
plays a crucial role in the optimization objective of our convex program, and studying its
properties and the corresponding schedule's structure is an important part of our analysis.
We will elaborate on $V_k$ once we have derived the convex program (see Section 2.2).

Figure 1 Mathematical programming formulation (IMP) of our scheduling problem.

For a given job set $J$, let us partition the time horizon into $N \in \mathbb{N}$ atomic intervals $T_k$
($k \in \{1, 2, \dots, N\}$) as follows. We define $T_k := [\tau_{k-1}, \tau_k)$ where $\tau_0 < \tau_1 < \dots < \tau_N$ are
chosen such that $\{\tau_0, \tau_1, \dots, \tau_N\} = \{r_j, d_j \mid j \in J\}$. Let $l_k := \tau_k - \tau_{k-1}$ denote the length
of interval $T_k$. Note that there are at most $2n - 1$ intervals. To model the deadline constraint
of job $j$, we introduce parameters $c_{jk} \in \{0, 1\}$ that indicate whether $T_k \subseteq [r_j, d_j)$ ($c_{jk} = 1$)
or not ($c_{jk} = 0$). Our program uses two types of variables: load variables $x_{jk} \in [0, 1]$ for each
job $j \in J$ and each atomic interval $k \in \{1, 2, \dots, N\}$, and indicator variables $y_j \in \{0, 1\}$
for each job $j \in J$. The variable $x_{jk}$ indicates what portion of $j$'s workload is assigned to
interval $T_k$ and the variable $y_j$ indicates whether job $j$ is finished ($y_j = 1$) or not ($y_j = 0$).
Figure 1 shows the complete (integral) mathematical program (IMP) for our scheduling
problem. The first summand in the objective corresponds to the energy spent in the different
intervals. The second summand charges costs for all unfinished jobs. The set of constraints
ensures that a job can be declared as finished only if it has been completely assigned to
intervals $T_k$ lying in its release-deadline interval $[r_j, d_j)$. We use $x$ and $y$ to refer to the full
vectors of variables $x_{jk}$ and $y_j$, and we use the symbol $\preceq$ for element-wise comparison.
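The partition into atomic intervals and the parameters $c_{jk}$ can be computed directly from the release times and deadlines. The following sketch assumes jobs are given as `(release, deadline)` pairs; the function name `atomic_intervals` is hypothetical.

```python
def atomic_intervals(jobs):
    """Partition time by all release times and deadlines.

    jobs: list of (release, deadline) pairs.
    Returns (intervals, c) where intervals[k] = (tau_{k}, tau_{k+1}) and
    c[j][k] = 1 iff interval k lies inside job j's window [r_j, d_j).
    """
    points = sorted({t for r, d in jobs for t in (r, d)})
    intervals = list(zip(points[:-1], points[1:]))
    c = [
        [1 if r <= a and b <= d else 0 for (a, b) in intervals]
        for (r, d) in jobs
    ]
    return intervals, c


intervals, c = atomic_intervals([(0, 4), (1, 3), (2, 6)])
```

With three jobs and six distinct time points, the example yields five atomic intervals, matching the bound of at most $2n - 1$ intervals.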

If we relax the domain of (IMP) such that $0 \preceq y \preceq 1$, we get a convex program. We refer
to this convex program as (CP). By introducing dual variables $\lambda_j$ (also called Lagrange
multipliers) for each constraint of (CP) we can write its Lagrangian $L(x, y, \lambda)$ as

$$L(x, y, \lambda) = \sum_{k=1}^{N} V_k(x_{1k}, x_{2k}, \dots, x_{nk}) + \sum_{j \in J} (1 - y_j) v_j + \sum_{j \in J} \lambda_j \Bigl( y_j - \sum_{k=1}^{N} c_{jk} x_{jk} \Bigr). \tag{3}$$

It is a linear combination of the convex program's objective and constraints. Instead of
prohibiting infeasible solutions (as done by the convex program), it charges a penalty for
violated constraints (assuming positive $\lambda_j$). Now, the dual function of (CP) is defined as

$$g(\lambda) := \inf_{x, y} L(x, y, \lambda). \tag{4}$$

An important property of the dual function $g$ is that for any $\lambda \succeq 0$, the value $g(\lambda)$ is a lower
bound on the optimal value of (CP). Moreover, since (CP) is a relaxation of (IMP), $g(\lambda)$ is
also a lower bound on the optimal value of (IMP). See the book by Boyd and Vandenberghe
[8] for further details on these and similar known facts about (convex) optimization problems.
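The lower-bound property is the standard weak duality argument. As a short worked derivation, assume (this is an assumption, since Figure 1 is not reproduced here) that the constraints of (CP) have the form $y_j - \sum_k c_{jk} x_{jk} \leq 0$, as the description of the constraints suggests. Writing $f(x, y)$ for the objective of (CP), any feasible $(x, y)$ and any $\lambda \succeq 0$ satisfy:

```latex
\begin{align*}
g(\lambda)
  &= \inf_{x', y'} L(x', y', \lambda)
   \;\leq\; L(x, y, \lambda) \\
  &= f(x, y) + \sum_{j \in J} \lambda_j
     \underbrace{\Bigl( y_j - \sum_{k=1}^{N} c_{jk} x_{jk} \Bigr)}_{\leq\, 0 \text{ by feasibility}}
   \;\leq\; f(x, y).
\end{align*}
```

Taking the infimum over all feasible $(x, y)$ on the right-hand side gives $g(\lambda) \leq \mathrm{OPT}_{\mathrm{(CP)}} \leq \mathrm{OPT}_{\mathrm{(IMP)}}$, where the last inequality holds because every feasible solution of (IMP) is also feasible for the relaxation (CP).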



2.2 Power Consumption in Atomic Intervals 

Let us give a more detailed description of the function $V_k(x_{1k}, x_{2k}, \dots, x_{nk})$. We defined $V_k$
implicitly by mapping a given work assignment $x_{1k}, x_{2k}, \dots, x_{nk}$ for interval $T_k$ to the power
consumption of Chen et al.'s algorithm [11] during $T_k$. This guarantees an energy-minimal
schedule for the given work assignment. In the following, we give a concise description of
this algorithm and derive a more explicit formulation as well as some properties of $V_k$.

Figure 2 Schedules computed by Chen et al.'s algorithm (a) before and (b) after the arrival of a new job.

To ease the discussion, let us assume that the jobs are numbered such that $x_{1k} w_1 \geq
x_{2k} w_2 \geq \dots \geq x_{nk} w_n$. In a nutshell, Chen et al.'s algorithm can be described as follows.
Define the job set

$$\psi(k) := \Bigl\{\, j \in J \Bigm| j < m \;\wedge\; x_{jk} > 0 \;\wedge\; x_{jk} w_j \geq \frac{\sum_{j' > j} x_{j'k} w_{j'}}{m - j} \,\Bigr\}. \tag{5}$$

These jobs are called dedicated jobs and are scheduled on their own dedicated processor using
the energy-optimal (since minimal) speed $s_{jk} := x_{jk} w_j / l_k$. All remaining jobs, called pool jobs,
are scheduled on the remaining (pool) processors in a greedy manner. The intuition is that
dedicated jobs are larger than the remaining average workload and thus must be processed on
a dedicated processor. See u\ Section 3.1] for a relatively short but more detailed description 
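of the algorithm. Figure 2 illustrates the resulting schedule and how it may change due to
the arrival of a new job. Using the above definition of the set $\psi(k)$ of dedicated jobs we can write $V_k$ as

$$V_k(x_{1k}, \dots, x_{nk}) = \sum_{j \in \psi(k)} l_k\, P_\alpha\Bigl( \frac{x_{jk} w_j}{l_k} \Bigr) + \bigl(m - |\psi(k)|\bigr)\, l_k\, P_\alpha\Bigl( \frac{\sum_{j \notin \psi(k)} x_{jk} w_j}{(m - |\psi(k)|)\, l_k} \Bigr). \tag{6}$$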
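The split into dedicated and pool jobs, and the resulting energy of an atomic interval, can be sketched as follows. This is a simplified illustration of the description above, not a reproduction of Chen et al.'s algorithm: it only computes which loads get a dedicated processor and the resulting energy, and it does not construct the actual preemptive pool schedule. The function names, and the choice to treat ties as dedicated (which does not affect the speeds), are assumptions.

```python
def split_dedicated(loads, m):
    """Split the loads x_jk * w_j of one atomic interval into dedicated and pool loads.

    A job (in order of decreasing load) is dedicated while more than one
    processor is free and its load is at least the remaining work divided
    by the remaining processors.
    """
    order = sorted((w for w in loads if w > 0), reverse=True)
    remaining = sum(order)
    dedicated = []
    for w in order:
        free = m - len(dedicated)
        if free > 1 and w * (free - 1) >= remaining - w:
            dedicated.append(w)
            remaining -= w
        else:
            break
    pool = order[len(dedicated):]
    return dedicated, pool


def interval_energy(loads, m, length, alpha):
    """Energy of one atomic interval of the given length for power s**alpha."""
    dedicated, pool = split_dedicated(loads, m)
    energy = sum(length * (w / length) ** alpha for w in dedicated)
    if pool:
        pool_procs = m - len(dedicated)  # at least 1 by construction
        pool_speed = sum(pool) / (pool_procs * length)
        energy += pool_procs * length * pool_speed ** alpha
    return energy
```

For loads 3, 1, 1 on two processors in a unit interval with $\alpha = 3$, the largest job is dedicated (speed 3) and the two small jobs share the single pool processor (speed 2), giving energy $27 + 8 = 35$. With three unit loads, no job is dedicated and both processors run at speed 1.5.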



The following proposition gathers some important properties concerning the power consumption
function $V_k$ of an atomic interval $T_k$.

► Proposition 1. Consider an arbitrary atomic interval $T_k$ together with its power consumption
function $V_k\colon \mathbb{R}^n_{\geq 0} \to \mathbb{R}_{\geq 0}$. This function has the following properties:

(a) It is convex and $V_k(0) = 0$.

(b) It is differentiable with partial derivatives $\frac{\partial V_k}{\partial x_{jk}}(x_{1k}, \dots, x_{nk}) = w_j \cdot P'_\alpha(s_{jk})$. Here, $s_{jk}$
denotes the speed used to schedule the workload $x_{jk} w_j$ in Chen et al.'s algorithm:

$$s_{jk} = \begin{cases} \dfrac{x_{jk} w_j}{l_k}, & \text{if $j$ is a dedicated job,} \\[1.5ex] \dfrac{\sum_{j' \in J \setminus \psi(k)} x_{j'k} w_{j'}}{(m - |\psi(k)|)\, l_k}, & \text{if $j$ is a pool job.} \end{cases} \tag{7}$$

Proof Sketch. (a) The equality $V_k(0) = 0$ is obvious from the definition of $V_k$. The
convexity follows easily from V7\ Lemma 3.2]. There, the authors proved the convexity of 
$(x_{1k}, \dots, x_{nk}) \mapsto V_k(x_{1k}/w_1, \dots, x_{nk}/w_n)$ (a linear transformation of $V_k$).

(b) Differentiability is obvious for all points $(x_{1k}, \dots, x_{nk})$ for which all the inequalities
$x_{jk} w_j \geq \sum_{j' > j} x_{j'k} w_{j'} / (m - j)$ in Equation (5) are strict: For these, we have a small interval
around $x_{jk}$ such that the set $\psi(k)$ of dedicated jobs does not change. On these intervals, $V_k$
is essentially a linear map of the differentiable function $P_\alpha(s) = s^\alpha$. For other points, one can
compute the left and right derivatives in $x_{jk}$, distinguishing whether job $j$ switched between
a dedicated processor and a pool processor, whether $j$ stays on a dedicated processor, or
whether $j$ stays on a pool processor and some other jobs switch between processor types. All
cases yield the same left and right derivatives as given in the statement. ◀
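The derivative formula of Proposition 1(b) can be checked numerically on a small instance. The sketch below re-implements the interval energy under the same simplifying assumptions as the description of dedicated and pool jobs (ties treated as dedicated) and compares a central finite difference with $w_j \cdot P'_\alpha(s_{jk}) = w_j \alpha s_{jk}^{\alpha-1}$. All function names are hypothetical.

```python
def interval_energy(x, w, m, length, alpha):
    """Energy V_k of one interval for job fractions x and workloads w."""
    loads = sorted((xi * wi for xi, wi in zip(x, w) if xi > 0), reverse=True)
    remaining, dedicated = sum(loads), 0
    energy = 0.0
    for load in loads:
        free = m - dedicated
        if free > 1 and load * (free - 1) >= remaining - load:
            energy += length * (load / length) ** alpha  # dedicated job
            remaining -= load
            dedicated += 1
        else:
            break
    if dedicated < len(loads):
        free = m - dedicated
        energy += free * length * (remaining / (free * length)) ** alpha  # pool
    return energy


def numeric_partial(x, w, j, m, length, alpha, h=1e-6):
    """Central finite difference of V_k with respect to x[j]."""
    up, down = list(x), list(x)
    up[j] += h
    down[j] -= h
    diff = interval_energy(up, w, m, length, alpha) - interval_energy(down, w, m, length, alpha)
    return diff / (2 * h)


# Two processors, unit interval, alpha = 3: job 0 is dedicated at speed 3,
# jobs 1 and 2 share the pool processor at speed 2.
x, w = [0.6, 0.2, 0.2], [5.0, 5.0, 5.0]
dedicated_slope = numeric_partial(x, w, 0, 2, 1.0, 3.0)
pool_slope = numeric_partial(x, w, 1, 2, 1.0, 3.0)
```

The predicted values are $5 \cdot 3 \cdot 3^2 = 135$ for the dedicated job and $5 \cdot 3 \cdot 2^2 = 60$ for a pool job.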

We will also need to compare the result of Chen et al.'s algorithm before and after the 
arrival of a new job (cf. Figure 2). That is, how can the workloads on the processors change
when a single entry of the work assignment changes from zero to some positive value?

► Proposition 2. Consider Chen et al.'s algorithm called for some interval with the two
work assignments $x = (x_1, x_2, \dots, x_n, 0)$ and $x' = (x_1, x_2, \dots, x_n, z)$ (i.e., before and after
the arrival of a new job). Let $L_i$ and $L'_i$ denote the total workload on the $i$-th fastest processor
in the resulting schedules, respectively. Then, we have $0 \leq L'_i - L_i \leq z$.

Proof Sketch. We consider only the normalized case. That is, the case of unit workloads
($w_j = 1$ for all jobs) and an atomic interval of unit length ($l_k = 1$). The general case
follows by a straightforward adaption. Without loss of generality, we furthermore assume
$x_1 \geq x_2 \geq \dots \geq x_n$. Note that we do not presume any relation between the newly arrived
workload $z$ and the remaining workloads. Let $S$ and $S'$ be the schedules produced by Chen
et al.'s algorithm for the work assignments $x$ and $x'$, respectively. Similarly, we use $d$ and
$d'$ to denote the number of dedicated processors, and $L_{pool}$ and $L'_{pool}$ for the workload of a
pool processor in $S$ and $S'$, respectively. Remember that pool processors have the smallest
workload. That is, we have $L_i \geq L_{pool}$ and $L'_i \geq L'_{pool}$ for all $i \in \{1, 2, \dots, m\}$.

We start with the proof of $L'_i - L_i \geq 0$. Observe that the arrival of the workload $z$ will not
cause any of the former pool jobs to become a dedicated job (cf. Equation (5)). Moreover,
by the same equation, for each dedicated processor that becomes a pool processor we also get
a new pool job that has a workload of at least $L_{pool}$. Thus, the workload of pool processors
from $S$ can only increase. The workload of the $i$-th fastest dedicated processor in $S$ is exactly
$x_i$. If it becomes a pool processor, we have $x_i \leq L'_{pool} = L'_i$, yielding $L_i = x_i \leq L'_i$. If it stays
a dedicated processor, its workload is the $i$-th largest value in $\{x_1, \dots, x_n, z\}$ and, thus, at
least as large as the $i$-th largest value in $\{x_1, \dots, x_n\}$, yielding $L_i \leq L'_i$. To prove the second
statement, $L'_i - L_i \leq z$, let us assume $L'_i - L_i > z$ and seek a contradiction. We distinguish
two cases, depending on the type (pool or dedicated) of the $i$-th fastest processor in $S'$:

Processor $i$ is a pool processor in $S'$. Note that $z < L'_i - L_i \leq L'_i$ and $i$ being a pool
processor implies that $z$ is also scheduled on a pool processor (cf. Equation (5)). As
$d'$ is the number of dedicated processors, we must have $i > d'$. Moreover, all the jobs
with workload less than $L'_{d'}$ must be pool jobs in $S'$. These are exactly the jobs which
are scheduled on the processors $d' + 1, \dots, m$ in schedule $S$. Thus, the total workload of
all pool processors in $S'$ equals $(m - d') L'_i = z + \sum_{i' > d'} L_{i'}$. Using $i > d'$, $L'_{i'} - L_{i'} \geq 0$
for all $i' \in \{1, 2, \dots, m\}$, and that all pool processors in $S'$ have the same workload, we
get $z = (m - d') L'_i - \sum_{i' > d'} L_{i'} = \sum_{i' > d'} (L'_{i'} - L_{i'}) \geq L'_i - L_i$. This
contradicts our assumption.

Processor $i$ is a dedicated processor in $S'$. Our assumption implies $L'_i > L_i + z \geq z$.
Together with $i$ being a dedicated processor this yields $L'_i = x_i$ (because $x_i$ remains the
$i$-th largest value in $\{x_1, \dots, x_n, z\}$). But the assumption also implies $L'_i > L_i + z \geq
L_i \geq x_i$. We get the contradiction $x_i = L'_i > x_i$. ◀
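Proposition 2 can also be sanity-checked on random instances in the normalized case (unit workloads, unit interval length). The sketch below uses a simplified load computation that follows the dedicated/pool description of Section 2.2, treating ties as dedicated; it is an illustration under these assumptions and no substitute for the proof. The function names are hypothetical.

```python
import random


def processor_loads(loads, m):
    """Workloads of the m processors, from fastest to slowest."""
    order = sorted((w for w in loads if w > 0), reverse=True)
    remaining = sum(order)
    result = []
    for w in order:
        free = m - len(result)
        if free > 1 and w * (free - 1) >= remaining - w:
            result.append(w)  # dedicated processor carries exactly this job
            remaining -= w
        else:
            break
    free = m - len(result)  # number of pool processors, at least 1
    pool_total = sum(order[len(result):])
    result.extend([pool_total / free] * free)
    return result


def holds_on_random_instances(trials=1000, seed=0, eps=1e-9):
    """Check 0 <= L'_i - L_i <= z for random work assignments."""
    rng = random.Random(seed)
    for _ in range(trials):
        m = rng.randint(1, 5)
        x = [rng.uniform(0.0, 1.0) for _ in range(rng.randint(0, 8))]
        z = rng.uniform(0.0, 2.0)
        before = processor_loads(x, m)
        after = processor_loads(x + [z], m)
        if any(not (-eps <= b - a <= z + eps) for a, b in zip(before, after)):
            return False
    return True
```

Since the total workload grows by exactly $z$, the upper bound $L'_i - L_i \leq z$ already follows once all differences are known to be nonnegative; the random check therefore mainly exercises the monotonicity part.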
1 {executed each time a new job $j \in J$ arrives}
2 init $x_{jk}$, $y_j$, and $\lambda_j$ with zero for all $k \in \{1, 2, \dots, N\}$
3 compute $\lambda_{jk} := \delta \frac{\partial V_k}{\partial x_{jk}}(x_{1k}, x_{2k}, \dots, x_{jk}, 0, \dots, 0)$ for each interval $T_k \subseteq [r_j, d_j)$
4
5 let the set $\mathcal{T}_{\min}$ contain all $T_k$ with minimal $\lambda_{jk}$
6 for each $T_k \in \mathcal{T}_{\min}$ in parallel:
7     increase $x_{jk}$ in a continuous way (which in turn raises $\lambda_{jk}$ according to line 3)
8     ensure that all $\lambda_{jk}$ of intervals in $\mathcal{T}_{\min}$ remain equal
9     update $\mathcal{T}_{\min}$ whenever the $\lambda_{jk}$ reach a $\lambda_{jk'}$ with $T_{k'} \notin \mathcal{T}_{\min}$
10 stop increasing once one of the following comes true
11     (a) $\sum_k x_{jk} = 1$: set $y_j := 1$, $\lambda_j := \lambda_{jk}$
12     (b) $\lambda_{jk} = v_j$: reset $x_{jk} := 0$, $\lambda_j := \lambda_{jk}$

Listing 1 Primal-Dual Algorithm PD with parameter $\delta$.



3 An Online Greedy Primal-Dual Algorithm 

The goal of this section is to use the convex programming formulation (CP) and its dual 
function $g\colon \mathbb{R}^n \to \mathbb{R}$ to derive a provably good online algorithm for our scheduling problem.
We start by describing an algorithm that computes a solution to (CP) in an online fashion,
but knowing the time partitioning $T_k$ ($k \in \{1, 2, \dots, N\}$). Subsequently, we explain how this
solution is used to compute the actual schedule and how we handle the fact that the actual
atomic intervals are not known beforehand. To solve (CP), we use a greedy primal-dual
approach for convex programs as suggested by Gupta, Krishnaswamy, and Pruhs [12]. Our
algorithm extends their framework to the multiprocessor case and to profitable scheduling
models. It shows how to incorporate rejection policies into the framework (handling the
integral constraints in the convex program) and how to cope with more complex power
functions of a system (in our case $V_k$).



The Primal-Dual Algorithm. 

Our primal-dual algorithm, in the following referred to as PD, maintains a set of primal
variables $(x, y)$ and a set of dual variables $\lambda$, all initialized with zero. Whenever a new job (i.e.,
a constraint in (CP)) arrives, we start to increase the primal variables $x_{jk}$ ($k \in \{1, 2, \dots, N\}$)
in a greedy fashion until either the full job is scheduled (i.e., $\sum_k x_{jk} = 1$) or the planned
energy investment for job $j$ becomes too large compared to its value. In the latter case,
the variables $x_{jk}$ are reset to zero, $\lambda_j$ is set to $v_j$, and $y_j$ remains zero (the job is rejected).
Otherwise, we set $y_j$ to one (the job is finished) and $\lambda_j$ to essentially the current rate of cost
increase per job workload. When greedily increasing the primal variables, we assign the next
infinitesimally small portion of job $j$ to those atomic intervals that cause the smallest increase
in costs. Essentially, these are the intervals where $j$'s workload would be scheduled with the
slowest speed. See Listing 1 for the algorithm.
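The greedy increase can be approximated by a simple discretization. The following sketch handles only the single processor case ($m = 1$), where the interval energy is $l_k (W_k / l_k)^\alpha$ for the total work $W_k$ in interval $T_k$, so that $\lambda_{jk} = \delta\, w_j\, \alpha\, (W_k / l_k)^{\alpha - 1}$. Instead of raising the variables continuously, it hands out the job's work in small chunks. The function name, the argument layout, and the step count are assumptions for illustration; this is not the authors' implementation.

```python
def pd_assign(lengths, loads, allowed, w, v, alpha, delta, steps=1000):
    """Discretized PD step for one arriving job on a single processor.

    lengths[k]: length of atomic interval k
    loads[k]:   work already assigned to interval k by earlier jobs
    allowed:    indices of the intervals inside the job's window [r_j, d_j)
    Returns (x, lam): x[k] is the fraction of the job assigned to interval k,
    or None if the job is rejected; lam is the job's dual variable.
    """
    added = [0.0] * len(lengths)

    def speed(k):
        return (loads[k] + added[k]) / lengths[k]

    def lam_of(k):
        # delta * dV_k/dx_jk = delta * w * P'(speed) for m = 1
        return delta * w * alpha * speed(k) ** (alpha - 1)

    chunk = w / steps
    for _ in range(steps):
        k = min(allowed, key=speed)  # interval with the smallest lambda_jk
        if lam_of(k) >= v:           # case (b): investment too large, reject
            return None, v
        added[k] += chunk            # raise x_jk by a small amount
    k = min(allowed, key=speed)
    return [a / w for a in added], lam_of(k)  # case (a): fully assigned


alpha = 3.0
delta = alpha ** (1 - alpha)
accepted = pd_assign([1.0, 1.0], [0.0, 0.0], [0, 1], 2.0, 100.0, alpha, delta)
rejected = pd_assign([1.0, 1.0], [0.0, 0.0], [0, 1], 2.0, 0.5, alpha, delta)
```

In the first call the job is spread evenly over both intervals (speed 1 each). In the second call the job's value is too small: the dual variable reaches $v_j$ before the job is fully assigned, so the job is rejected.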

The described algorithm is similar to primal-dual algorithms known from linear programming,
where primal and/or dual variables are raised at certain rates until the (relaxed)
complementary slackness conditions are met. In fact, this algorithm is derived by using
relaxed versions of the Karush-Kuhn-Tucker (KKT) conditions, essentially a generalization
of the complementary slackness conditions for convex (or even general nonlinear) programs.
The actual schedule used is the one computed by Chen et al.'s algorithm when applied to
the current work assignment given by the primal variables $x_{jk}$ for the atomic interval $T_k$.

Figure 3 (a) PD schedule. (b) OA schedule. The dashed lines indicate atomic intervals; the bars below, the jobs' availability. Note
that PD's schedule is more conservative in comparison, leaving more room for scheduling jobs that
might occur during the last atomic interval.

Concerning the Time Partitioning. 

Our algorithm formulation assumes a priori knowledge of the atomic intervals $T_k$. However,
since the jobs arrive in an online fashion, the exact partitioning is actually not known to
the algorithm. One can reformulate the algorithm such that it uses the intervals $T'_k$ induced
by the jobs $J' = \{1, 2, \dots, j\} \subseteq J$ it knows so far. If a refinement of an atomic interval
$T'_k = T_{k_1} \cup T_{k_2}$ occurs due to the arrival of a new job, the already assigned job portions
are simply split according to the ratios $|T_{k_1}|/|T'_k|$ and $|T_{k_2}|/|T'_k|$. This reformulated algorithm
produces an identical schedule. To see this, note that the algorithm with a priori knowledge
of the refinement $T'_k = T_{k_1} \cup T_{k_2}$ treats both intervals $T_{k_1}$ and $T_{k_2}$ as identical (with respect
to their relative size $|T_{k_i}|/|T'_k|$) up to the point when the job causing the refinement arrives.
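The proportional split on refinement is straightforward to state in code. The sketch below assumes the portions assigned to the coarse interval are stored in a dictionary mapping job identifiers to fractions; this data layout and the function name are hypothetical.

```python
def split_portions(portions, left_len, right_len):
    """Split job portions of an interval that is refined into two parts.

    portions: dict mapping job id -> portion x_jk assigned to the old interval.
    Each portion is divided in proportion to the lengths of the new intervals.
    """
    total = left_len + right_len
    left = {j: x * left_len / total for j, x in portions.items()}
    right = {j: x * right_len / total for j, x in portions.items()}
    return left, right


left, right = split_portions({1: 0.6, 2: 0.3}, left_len=1.0, right_len=2.0)
```

Because every portion is scaled by the same factor as the interval length, the work per unit of time (and hence the speed) in both new intervals equals that of the original interval.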

Relation to the OA Algorithm. 

For the case of a single processor and sufficiently high job values, algorithm PD is quite
similar to the popular OA algorithm by Yao, Demers, and Shenker [14]. When a new job
arrives, PD essentially finds the atomic intervals of lowest speed and increases their speed
to free computational resources to be used for the new job. This is also true for the OA
algorithm. However, while PD never changes how other jobs are distributed over atomic
intervals, OA may actually influence this distribution. Figure 3 gives a simple example for
the structural difference of the resulting schedules. Another interesting observation is that,
in the single processor case, our analysis yields the very same optimal rejection policy as
an OA-based algorithm by Chan, Lam, and Li [10]. Indeed, as we will see in Section 4, our
analysis yields that $\delta = \alpha^{1-\alpha}$ is the optimal choice for the parameter $\delta$. Using this parameter,
one can easily check that our rejection policy essentially states to reject a job if its energy
consumption in the planned schedule exceeds $\alpha^{\alpha-2} \cdot v_j$. Or, equivalently, a job is rejected if
its speed in the planned schedule exceeds $\alpha^{(\alpha-2)/(\alpha-1)} \cdot (v_j/w_j)^{1/(\alpha-1)}$, the rejection policy from [10].
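The two equivalent forms of the rejection policy can be derived in a few lines. The derivation below assumes a single processor and that job $j$ would run at one planned speed $s$ in all intervals it is assigned to (so that its energy is $(w_j/s) \cdot s^\alpha = w_j s^{\alpha-1}$); both are simplifying assumptions made here for illustration. Rejection happens when $\lambda_{jk}$ reaches $v_j$:

```latex
\begin{align*}
\lambda_{jk} = \delta \, w_j \, P'_\alpha(s)
  = \alpha^{1-\alpha} \cdot w_j \cdot \alpha s^{\alpha-1}
  = \alpha^{2-\alpha} \, w_j s^{\alpha-1}
  &= v_j \\
\Longleftrightarrow \quad
  \underbrace{w_j s^{\alpha-1}}_{\text{energy of job } j}
  &= \alpha^{\alpha-2} \, v_j \\
\Longleftrightarrow \quad
  s &= \alpha^{\frac{\alpha-2}{\alpha-1}}
       \Bigl( \frac{v_j}{w_j} \Bigr)^{\frac{1}{\alpha-1}} .
\end{align*}
```

For example, with $\alpha = 3$ a job is rejected once its planned energy exceeds $3 v_j$, or equivalently once its planned speed exceeds $\sqrt{3 v_j / w_j}$.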

4 Analysis 

In the following, let $(x, y)$ and $\lambda$ denote the primal and dual variables computed by our
algorithm PD. Remember that the final schedule computed by PD is derived by applying
Chen et al.'s algorithm to the $x_{1k}, \dots, x_{nk}$ values in each atomic interval $T_k$. We refer to
this schedule as the $(x, y)$-schedule or simply as the schedule PD. Our goal is to use $g(\lambda)$ to
bound the cost of this schedule (referred to as cost(PD)). Our main result is

► Theorem 3. The competitive ratio of algorithm PD with the parameter $\delta$ set to $\frac{1}{\alpha^{\alpha-1}}$ is
at most $\alpha^\alpha$. Moreover, there is a problem instance for which PD is worse than an optimal
algorithm by a factor of exactly $\alpha^\alpha$. That is, our upper bound is optimal.

For the upper bound, we show that $\mathrm{cost}(\mathrm{PD}) \leq \alpha^\alpha g(\lambda)$. Since, by duality, $g(\lambda)$ is
also a lower bound on the optimal value of (CP) and, thereby, on the optimal value of
(IMP), we get $\mathrm{cost}(\mathrm{PD}) / \mathrm{cost}(\mathrm{OPT}) \leq \alpha^\alpha$. The lower bound follows from a known result for traditional
energy-efficient scheduling (without job values but the necessity to finish all jobs) by setting
the job values sufficiently high.

In the remainder, we develop the key ingredients for the proof of Theorem 3. We start
in Section 4.1 and derive a more explicit formulation of the dual function value $g(\lambda)$ by
relating it to a certain (infeasible) solution to our convex program (CP) and a corresponding
schedule. Section 4.2 further simplifies this formulation by expressing $g(\lambda)$ solely in terms of
the jobs (instead of their workloads in different atomic intervals). Based on this job-centric
formulation, Section 4.3 develops different bounds for the dual function value depending on
certain job characteristics. The actual proof of Theorem 3 combines these bounds and can
be found in Section 4.4.



4.1 Structure of an Optimal Infeasible Solution 

First of all, note that the value $g(\tilde{\lambda}) = \inf_{x,y} L(x, y, \tilde{\lambda})$ (cf. the definition of the dual function) is finite and obtained
by a pair $(\hat{x}, \hat{y})$ of primal variables. These primal variables can be interpreted as a (possibly
infeasible) solution to the convex program (CP). Moreover, for our fixed dual variable $\tilde{\lambda}$,
this solution is optimal in that it minimizes the sum of the objective cost and the penalty for
violated constraints. In this sense, we refer to $(\hat{x}, \hat{y})$ as an optimal infeasible solution. Our
goal is to understand the structure of this solution, which will eventually allow us to write
$g(\tilde{\lambda})$ in a more explicit way. The results of this subsection are related to results from [12],
but more involved due to the more complex nature of our objective function.

Note that $\hat{x}$ and $\hat{y}$ may differ largely from $\tilde{x}$ and $\tilde{y}$. However, the following lemmas show a
strong correlation between this optimal infeasible solution and the feasible (partially integral)
solution computed by algorithm PD.

► Lemma 4. Consider an optimal infeasible solution $(\hat{x}, \hat{y})$. Without loss of generality, we
can assume that it has the following properties:

(a) $\hat{y} = \tilde{y}$

(b) For any atomic interval $T_k$, there are at most $m$ different jobs $j$ with $\hat{x}_{jk} > 0$.

Proof. (a) Consider an arbitrary job $j \in J$ and remember that the domain for the variables
$y_j$ is restricted to $[0, 1]$. The contribution of variable $\hat{y}_j$ to $g(\tilde{\lambda}) = L(\hat{x}, \hat{y}, \tilde{\lambda})$ is exactly
$\hat{y}_j(\tilde{\lambda}_j - v_j)$, as can be seen by considering Equation (3). If $\tilde{\lambda}_j < v_j$, this is minimized by
choosing $\hat{y}_j$ maximal ($\hat{y}_j = 1$). Otherwise, we must have $\tilde{\lambda}_j = v_j$ (by the definition of
algorithm PD). This allows us to choose $\hat{y}_j$ arbitrarily, such that we can set it to zero. Both
choices correspond exactly to the way $\tilde{y}_j$ is set by algorithm PD.

(b) Assume there are more than $m$ jobs with $\hat{x}_{jk} > 0$. We can assume $c_{jk} = 1$ for these
jobs, because otherwise we could set $\hat{x}_{jk} = 0$ without increasing $g(\tilde{\lambda}) = L(\hat{x}, \hat{y}, \tilde{\lambda})$. Now, the
values $\hat{x}_{1k}, \dots, \hat{x}_{nk}$ correspond to a work assignment for the atomic interval $T_k$ as used by
Chen et al.'s algorithm (cf. Section 2.2). By Equation (3), the contribution of these values
to $g(\tilde{\lambda}) = L(\hat{x}, \hat{y}, \tilde{\lambda})$ is given by $P_k(\hat{x}_{1k}, \dots, \hat{x}_{nk}) - \sum_{j \in J} \tilde{\lambda}_j \hat{x}_{jk}$. Since there are more than
$m$ jobs $j$ with nonzero $\hat{x}_{jk}$, at least two of them must share a processor in the schedule
computed by Chen et al.'s algorithm for this work assignment. In other words, there are two
pool jobs $j, j' \in J \setminus \psi(k)$ with $\hat{x}_{jk}, \hat{x}_{j'k} > 0$, where $\psi(k)$ denotes the dedicated jobs of $T_k$.
Together with Equation (6), we see that the contribution of $\hat{x}_{jk}$ and $\hat{x}_{j'k}$ to $g(\tilde{\lambda})$ consists of
two terms: a convex term

\[
(m - |\psi(k)|)\, l_k\, P_\alpha\!\left( \frac{\sum_{i \in J \setminus \psi(k)} \hat{x}_{ik} w_i}{(m - |\psi(k)|)\, l_k} \right)
\]

and a linear term $-\tilde{\lambda}_j \hat{x}_{jk} - \tilde{\lambda}_{j'} \hat{x}_{j'k}$. By changing $\hat{x}_{jk}$ and $\hat{x}_{j'k}$ along the line that keeps the
sum $\hat{x}_{jk} w_j + \hat{x}_{j'k} w_{j'}$ constant, we can decrease one of the variables (say $\hat{x}_{jk}$) and increase
the other such that the first (convex) term remains constant and the second (linear) term is
not increased. This will not affect the type (dedicated or pool) of other jobs. The only job
that may change its type is job $j'$, as it may become a dedicated job. Once this happens,
we iterate the process with two other pool jobs. As the number of dedicated jobs is upper
bounded by $m$, this can happen only finitely often. Thus, at some point we can decrease $\hat{x}_{jk}$
all the way to zero without increasing the dual function value $g(\tilde{\lambda})$. We continue eliminating
$\hat{x}_{jk}$ variables until at most $m$ of them are nonzero. ◄

Given an atomic interval $T_k$, we call the jobs $j$ with $\hat{x}_{jk} > 0$ the contributing jobs of $T_k$ and
denote the corresponding job set by $\varphi(k)$. As done in the proof of Lemma 4, we can consider
$\hat{x}$ as a work assignment for the atomic intervals $T_k$. By applying Chen et al.'s algorithm, we
get a schedule whose energy cost in interval $T_k$ is exactly $P_k(\hat{x}_{1k}, \dots, \hat{x}_{nk})$. We refer to this
schedule as the $(\hat{x}, \hat{y})$-schedule. Using this terminology, the second statement of Lemma 4
essentially says that in this schedule at most $m$ jobs are scheduled in any atomic interval
$T_k$. Moreover, it follows immediately from the description of Chen et al.'s algorithm that all
contributing jobs are dedicated jobs of the corresponding atomic interval.

We can derive a slightly more explicit characterization of the contributing jobs $\varphi(k)$ of
an atomic interval $T_k$ by exploiting that $(\hat{x}, \hat{y})$ is a minimizer of $(x, y) \mapsto L(x, y, \tilde{\lambda})$.

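► Lemma 5. Consider any atomic interval $T_k$ and its contributing jobs $\varphi(k)$. Define the
value $\hat{s}_j := (\tilde{\lambda}_j / \alpha w_j)^{1/(\alpha-1)}$ for any job $j$.

(a) For any $j \in \varphi(k)$ we have $\hat{x}_{jk} = \frac{l_k}{w_j} \hat{s}_j = \frac{l_k}{w_j} (\tilde{\lambda}_j / \alpha w_j)^{1/(\alpha-1)}$. Moreover, $j$ is scheduled at
constant speed $\hat{s}_j$ in the $(\hat{x}, \hat{y})$-schedule.

(b) The total contribution of the $\hat{x}_{jk}$ variables to the dual function value $g(\tilde{\lambda})$ is

\[
(1 - \alpha)\, l_k \sum_{j \in \varphi(k)} \left( \frac{\tilde{\lambda}_j}{\alpha w_j} \right)^{\alpha/(\alpha-1)} = (1 - \alpha)\, l_k \sum_{j \in \varphi(k)} \hat{s}_j^\alpha. \tag{8}
\]

(c) Let $n_k$ denote the number of jobs available in the atomic interval $T_k$ (i.e., jobs with
$c_{jk} = 1$). The contributing jobs $\varphi(k)$ are the $\min(m, n_k)$ jobs with maximal $\hat{s}_j$-values
among all available jobs.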

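Proof. (a) By definition, $\hat{x}$ is a minimizer of $x \mapsto L(x, \hat{y}, \tilde{\lambda})$. This implies that we must have
$\frac{\partial L}{\partial x_{jk}}(\hat{x}, \hat{y}, \tilde{\lambda}) = 0$ for any contributing job $j \in \varphi(k)$. We get

\[
0 = \frac{\partial L}{\partial x_{jk}}(\hat{x}, \hat{y}, \tilde{\lambda})
  = \frac{\partial P_k}{\partial x_{jk}}(\hat{x}_{1k}, \dots, \hat{x}_{nk}) - \tilde{\lambda}_j
  = w_j P_\alpha'\!\left( \frac{\hat{x}_{jk} w_j}{l_k} \right) - \tilde{\lambda}_j
  = \alpha w_j \left( \frac{\hat{x}_{jk} w_j}{l_k} \right)^{\alpha-1} - \tilde{\lambda}_j,
\]

which yields the first statement by rearranging. The second statement follows from this by
noticing that $\hat{x}_{jk} w_j / l_k$ is the speed used by Chen et al.'s algorithm for the (dedicated) job $j$.

(b) By definition of $g(\tilde{\lambda}) = L(\hat{x}, \hat{y}, \tilde{\lambda})$, we get that the total contribution of the $\hat{x}_{jk}$ variables
is (there are no pool jobs!)

\[
P_k(\hat{x}_{1k}, \dots, \hat{x}_{nk}) - \sum_{j \in \varphi(k)} \tilde{\lambda}_j \hat{x}_{jk}
  = l_k \sum_{j \in \varphi(k)} P_\alpha(\hat{s}_j) - \alpha l_k \sum_{j \in \varphi(k)} \frac{\tilde{\lambda}_j}{\alpha w_j} \hat{s}_j
  = (1 - \alpha)\, l_k \sum_{j \in \varphi(k)} \hat{s}_j^\alpha.
\]

(c) The contributing jobs must be chosen such that their contribution is minimized. Using
statement (b) and $\alpha > 1$, we see that this is the case when choosing the maximal number of
available jobs (at most $m$) with the largest $\hat{s}_j$-values. ◄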
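Lemma 5 translates directly into a small computation for a single atomic interval. The following is a minimal sketch, assuming the dual variables, workloads, and the set of available jobs are given as plain Python containers; the function name `contributing_jobs` and the sample numbers are hypothetical and not part of the algorithm description.

```python
def contributing_jobs(lambdas, workloads, available, m, alpha, l_k):
    """Select the contributing jobs of one atomic interval as in Lemma 5.

    Returns the selected jobs (largest s_hat first), their portions x_hat,
    and their total contribution (1 - alpha) * l_k * sum(s_hat**alpha).
    """
    s_hat = {j: (lambdas[j] / (alpha * workloads[j])) ** (1.0 / (alpha - 1.0))
             for j in available}
    # Lemma 5(c): the min(m, n_k) available jobs with the largest s_hat values.
    chosen = sorted(available, key=lambda j: s_hat[j], reverse=True)[:m]
    # Lemma 5(a): each chosen job runs at speed s_hat for the whole interval.
    x_hat = {j: l_k * s_hat[j] / workloads[j] for j in chosen}
    # Lemma 5(b): closed form of the contribution to the dual function value.
    contribution = (1.0 - alpha) * l_k * sum(s_hat[j] ** alpha for j in chosen)
    return chosen, x_hat, contribution


lambdas = {1: 12.0, 2: 3.0, 3: 54.0}
workloads = {1: 1.0, 2: 1.0, 3: 2.0}
chosen, x_hat, contribution = contributing_jobs(
    lambdas, workloads, available=[1, 2, 3], m=2, alpha=3.0, l_k=2.0)
# Direct evaluation: energy of the dedicated jobs minus the linear penalty term.
direct = (sum(2.0 * (x_hat[j] * workloads[j] / 2.0) ** 3.0 for j in chosen)
          - sum(lambdas[j] * x_hat[j] for j in chosen))
```

In this example the closed form from statement (b) and the direct evaluation of energy minus penalty coincide, as the lemma predicts.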



4.2 A Job-centric Formulation of the Dual Function 

In the following, we assume that the optimal infeasible solution $(\hat{x}, \hat{y})$ adheres to Lemma 4.
That is, we have $\hat{y} = \tilde{y}$ and we can relate the optimal infeasible solution to the $(\hat{x}, \hat{y})$-schedule
which schedules in each atomic interval $T_k$ exactly the $|\varphi(k)|$ ($\le m$) available jobs with the
largest $\hat{s}_j = (\tilde{\lambda}_j / \alpha w_j)^{1/(\alpha-1)}$-values, each on its own dedicated processor at speed $\hat{s}_j$. We use the
somewhat lax notation $k \in \varphi^{-1}(j)$ to refer to the atomic intervals $T_k$ to which $j$ contributes.

Our main goal in this section is to derive a formulation of the dual function value solely in 
terms of the jobs. We will also define and discuss the trace of a job, which helps to relate 
any job (even if unfinished) to a certain amount of energy consumed by our PD algorithm. 

Given a job $j \in J$, let $l(j) := \sum_{k \in \varphi^{-1}(j)} l_k$ denote the total time it is scheduled in the
$(\hat{x}, \hat{y})$-schedule. Moreover, let $E_{\hat{x}}(j)$ denote the total energy invested by the $(\hat{x}, \hat{y})$-schedule
into job $j$. Now, we can formulate the following lemma.

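► Lemma 6. For any job $j \in J$, the total energy invested by the optimal infeasible solution
into job $j$ is $E_{\hat{x}}(j) = l(j)\, \hat{s}_j^\alpha$. Moreover, the dual function value $g(\tilde{\lambda})$ can be written as

\[
g(\tilde{\lambda}) = (1 - \alpha) \sum_{j \in J} E_{\hat{x}}(j) + \sum_{j \in J} \tilde{\lambda}_j. \tag{9}
\]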

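Proof. The equality $E_{\hat{x}}(j) = l(j)\, \hat{s}_j^\alpha$ follows immediately from the above definitions, as $j$ is
processed by the $(\hat{x}, \hat{y})$-schedule at constant speed $\hat{s}_j$ for a total time of exactly $l(j)$. For the
lemma's main statement, remember that $\hat{y}_j = 0$ if and only if $\tilde{\lambda}_j = v_j$. Otherwise we have
$\hat{y}_j = 1$. Thus, the contribution of $\hat{y}_j$ to $g(\tilde{\lambda})$ is exactly $(1 - \hat{y}_j) v_j + \tilde{\lambda}_j \hat{y}_j = \tilde{\lambda}_j$. As we have seen
in Lemma 5, for a fixed $k$, the contribution of all $\hat{x}_{jk}$ to $g(\tilde{\lambda})$ is exactly $(1 - \alpha)\, l_k \sum_{j \in \varphi(k)} \hat{s}_j^\alpha$.
Summing over all $k$, we get that the total contribution of the $\hat{x}$-variables equals

\[
\sum_{k=1}^{N} (1 - \alpha)\, l_k \sum_{j \in \varphi(k)} \hat{s}_j^\alpha
  = (1 - \alpha) \sum_{k=1}^{N} \sum_{j \in \varphi(k)} l_k \hat{s}_j^\alpha
  = (1 - \alpha) \sum_{j \in J} \sum_{k \in \varphi^{-1}(j)} l_k \hat{s}_j^\alpha
  = (1 - \alpha) \sum_{j \in J} l(j)\, \hat{s}_j^\alpha
  = (1 - \alpha) \sum_{j \in J} E_{\hat{x}}(j).
\]

Summing up the contributions of the $\hat{x}$- and $\hat{y}$-variables, we get the desired statement. ◄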
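The change of summation order in this proof can be illustrated with a short computation. The following is a minimal sketch, assuming the contributing sets $\varphi(k)$ and the speeds $\hat{s}_j$ are already known; all names and numbers are hypothetical sample data.

```python
def dual_value_job_centric(lengths, phi, s_hat, lambdas, alpha):
    """Evaluate g via Equation (9): (1 - alpha) * sum_j E(j) + sum_j lambda_j."""
    total_time = {j: 0.0 for j in lambdas}
    for k, jobs in phi.items():
        for j in jobs:
            total_time[j] += lengths[k]          # l(j): time j is scheduled
    energy = {j: total_time[j] * s_hat[j] ** alpha for j in lambdas}
    return (1.0 - alpha) * sum(energy.values()) + sum(lambdas.values()), energy


def dual_value_interval_wise(lengths, phi, s_hat, lambdas, alpha):
    """Evaluate g by summing the per-interval contributions from Lemma 5(b)."""
    x_part = sum((1.0 - alpha) * lengths[k] * sum(s_hat[j] ** alpha for j in jobs)
                 for k, jobs in phi.items())
    return x_part + sum(lambdas.values())


lengths = {1: 2.0, 2: 1.0}                      # interval lengths l_k
phi = {1: [3, 1], 2: [1, 2]}                    # contributing jobs per interval
s_hat = {1: 2.0, 2: 1.0, 3: 3.0}
lambdas = {1: 12.0, 2: 3.0, 3: 54.0}
g_jobs, energy = dual_value_job_centric(lengths, phi, s_hat, lambdas, 3.0)
g_intervals = dual_value_interval_wise(lengths, phi, s_hat, lambdas, 3.0)
```

Both evaluations return the same value, since they only differ in whether the double sum is grouped by interval or by job.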
Tracing a Job.

Given a job $j$, we define its trace as a set of tuples $(T_k, i)$ with $k \in \{1, 2, \dots, N\}$ and
$i \in \{1, 2, \dots, m\}$. That is, a set of atomic intervals, each coupled with a certain processor.
Our goal is to choose these such that we can account the energy $E_{\hat{x}}(j)$ used in the optimal
infeasible solution on job $j$ to the energy used by algorithm PD during $j$'s trace (on the
coupled processors). For the formal definition, let us first partition the contributing jobs $\varphi(k)$
of an interval $T_k$ into the subset $\varphi_1(k) := \{ j \in \varphi(k) \mid \tilde{y}_j = 1 \}$ of jobs finished by PD and
the subset $\varphi_2(k) := \{ j \in \varphi(k) \mid \tilde{y}_j = 0 \}$ of jobs unfinished by PD. Now, for any job $j \in J$
we define its trace $\mathrm{Tr}(j)$ as follows:

Case $\tilde{y}_j = 1$: $(T_k, i) \in \mathrm{Tr}(j) \iff \hat{s}_j$ is the $i$-th largest value$^1$ in $\{ \hat{s}_{j'} \mid j' \in \varphi_1(k) \}$

Case $\tilde{y}_j = 0$: $(T_k, |\varphi_1(k)| + i) \in \mathrm{Tr}(j) \iff \hat{s}_j$ is the $i$-th largest value in $\{ \hat{s}_{j'} \mid j' \in \varphi_2(k) \}$

That is, jobs that are finished by PD are mapped to the fastest processors in each atomic
interval for which they are contributing jobs, in decreasing order of their $\hat{s}_j$-values. Jobs
contributing to $T_k$ but which are unfinished by PD are mapped to the remaining processors
(the exact order is not important in this case). Note that by this mapping, all traces $\mathrm{Tr}(j)$
are pairwise disjoint. We use the notation $E_{\mathrm{PD}}(j)$ to refer to the power consumption of PD
during $j$'s trace. That is, the power consumption on the $i$-th fastest processor in the atomic
interval $T_k$ for any $(T_k, i) \in \mathrm{Tr}(j)$. We use $E_{\mathrm{PD}}$ to denote the total power consumption of
PD. Since the job traces are pairwise disjoint, we obviously have $E_{\mathrm{PD}} \ge \sum_{j \in J} E_{\mathrm{PD}}(j)$.
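The trace definition can be read as a simple ranking procedure per atomic interval. The following is a minimal sketch, assuming the contributing sets, the $\hat{s}_j$-values, and the finished/unfinished status of each job are given; ties are broken by job index, which is one arbitrary but consistent choice, and all names are hypothetical.

```python
def traces(phi, s_hat, finished):
    """Map each job to its trace, a set of (interval index, processor rank) tuples."""
    trace = {}
    for k, jobs in phi.items():
        # Finished jobs occupy the fastest ranks, ordered by decreasing s_hat.
        done = sorted((j for j in jobs if finished[j]), key=lambda j: (-s_hat[j], j))
        # Unfinished jobs follow on the remaining ranks.
        rest = sorted((j for j in jobs if not finished[j]), key=lambda j: (-s_hat[j], j))
        for rank, j in enumerate(done + rest, start=1):
            trace.setdefault(j, set()).add((k, rank))
    return trace


phi = {1: [3, 1], 2: [1, 2]}                    # contributing jobs per interval
s_hat = {1: 2.0, 2: 1.0, 3: 3.0}
finished = {1: True, 2: True, 3: False}
trace = traces(phi, s_hat, finished)
all_tuples = [t for tuples in trace.values() for t in tuples]
```

Because every rank in every interval is assigned to at most one job, the resulting traces are pairwise disjoint, which is exactly the property used to bound the sum of the per-trace energies by the total energy of PD.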

The following proposition formulates an important structural property of a job's trace. It
gives us different lower bounds on the speed used by PD during a job's trace, depending
on whether it is finished or not. To this end, let $\tilde{s}_j$ denote the speed PD planned to use for
job $j$ just before $\tilde{\lambda}_j$ got fixed (i.e., just before PD decides whether to finish $j$ or not). If $j$ is
finished, we have (cf. algorithm description and Proposition 1)

\[
\tilde{\lambda}_j = \delta \frac{\partial P_k}{\partial x_{jk}}(\tilde{x}_{1k}, \dots, \tilde{x}_{jk}, 0, \dots, 0) = \delta w_j P_\alpha'(\tilde{s}_j). \tag{10}
\]

Solving this for $\tilde{s}_j$ yields $\tilde{s}_j = (\tilde{\lambda}_j / \delta \alpha w_j)^{1/(\alpha-1)} = \delta^{-1/(\alpha-1)} \hat{s}_j$. Similarly, we also get
$\tilde{s}_j = \delta^{-1/(\alpha-1)} \hat{s}_j$ for unfinished jobs. We use $\tilde{x}_j = \sum_k \tilde{x}_{jk} < 1$ to denote the corresponding portions
of the unfinished job $j$ planned to be scheduled by PD just before $j$ was rejected.

► Proposition 7. Consider $(T_k, i) \in \mathrm{Tr}(j)$ for a job $j \in J$. Let $s(i, k)$ denote the speed of
the $i$-th fastest processor during $T_k$ in the final schedule computed by PD. Then:

(a) If $j$ is finished by PD, then $s(i, k) \ge \tilde{s}_j$.

(b) If $j$ is not finished by PD, then $s(i, k) \ge \tilde{s}_j - \frac{\tilde{x}_{jk} w_j}{l_k}$.

Proof. (a) Remember that $\tilde{s}_j = \delta^{-1/(\alpha-1)} \hat{s}_j$. Because of this relation and the definition of
$(T_k, i) \in \mathrm{Tr}(j)$, we must have that $\tilde{s}_j$ is the $i$-th largest value in $\{ \tilde{s}_{j'} \mid j' \in \varphi_1(k) \}$. Together
with Lemma 5(c), we even have that $\tilde{s}_j$ is the $i$-th largest value among all available jobs
finished by PD. At the time $t_{k-1}$ (the start of interval $T_k$), all these available jobs $j'$ have
arrived. We consider two cases: If $j$ is a dedicated job at this time, it is scheduled with a
speed of exactly $\tilde{s}_j$. Moreover, all the $i - 1$ available jobs $j'$ with $\tilde{s}_{j'} > \tilde{s}_j$ are dedicated jobs
and are scheduled with a speed of $\tilde{s}_{j'}$, respectively. Thus, $j$ is scheduled on the $i$-th fastest
processor, yielding $s(i, k) \ge \tilde{s}_j$. If $j$ is a pool job at this time, it is scheduled on one of the
pool processors at a speed of at least $\tilde{s}_j$. But then, since pool processors are the slowest
processors, the $i$-th fastest processor must also run at a speed of at least $\tilde{s}_j$.

(b) Remember that $\tilde{x}_{jk}$ denotes the portion of job $j$ PD planned to schedule in $T_k$ just before
$j$ got rejected. If $j$ was planned as a dedicated job, we have $l_k \tilde{s}_j = \tilde{x}_{jk} w_j$. This trivially
yields the desired statement because of $s(i, k) \ge 0$. If $j$ was not planned as a dedicated job, it
was to be processed on a pool processor. Let $L(i, k)$ denote the workload on the $i$-th fastest
processor during $T_k$ just after $j$ was rejected (i.e., without $\tilde{x}_{jk} w_j$). Similarly, let $L'(i, k)$
denote the workload on the $i$-th fastest processor during $T_k$ just before $j$ was rejected (i.e.,
including $\tilde{x}_{jk} w_j$). Proposition 2 gives us $L'(i, k) - L(i, k) \le \tilde{x}_{jk} w_j$. Moreover, since $j$ was
planned as a pool job (which run at minimal speed), we must have $l_k \tilde{s}_j \le L'(i, k)$. Combining
these inequalities yields that the speed $L(i, k) / l_k$ on the $i$-th fastest processor during $T_k$ at $j$'s
arrival was at least $\tilde{s}_j - \frac{\tilde{x}_{jk} w_j}{l_k}$. As Proposition 2 also implies that the workload (and, thus,
the speed) of the $i$-th fastest processor in an atomic interval can only increase due to the
arrival of new jobs, we get the desired statement. ◄

1 Ties are resolved arbitrarily but consistently.



4.3 Balancing the Different Cost Components 

As our goal is to lower-bound the dual function value $g(\tilde{\lambda}) = (1 - \alpha) \sum_{j \in J} E_{\hat{x}}(j) + \sum_{j \in J} \tilde{\lambda}_j$ by
the cost of algorithm PD, we have to relate the values $E_{\hat{x}}(j)$ and $\tilde{\lambda}_j$ to the energy- and
value-costs of PD. It depends on the job itself how this is done exactly. For example, in
the case of finished jobs, both terms can be related to the actual energy consumption of PD
in a relatively straightforward way. This becomes much harder if the job is not finished by
PD: after all, in this case PD does not invest any energy into the job. The job's trace plays
a crucial role in this case, as it allows us to account the energy investment of the optimal
infeasible solution to the energy PD consumed during the trace. The next proposition gathers
the most important relations to be used in the following proofs.

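► Proposition 8. Consider an arbitrary job $j \in J$:

(a) $E_{\hat{x}}(j) = \tilde{\lambda}_j \frac{\hat{x}_j}{\alpha}$.

(b) If $j$ is finished by PD, then $E_{\hat{x}}(j) \le \delta^{\alpha/(\alpha-1)} E_{\mathrm{PD}}(j)$.

(c) If $j$ is not finished by PD and $\hat{x}_j > \delta^{1/(\alpha-1)}$, then

\[
E_{\hat{x}}(j) \le \delta^{\alpha/(\alpha-1)} \left( 1 - \frac{\delta^{1/(\alpha-1)}}{\hat{x}_j} \right)^{-\alpha} E_{\mathrm{PD}}(j). \tag{11}
\]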

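Proof. (a) We use the identities $\hat{s}_j = (\tilde{\lambda}_j / \alpha w_j)^{1/(\alpha-1)}$ and $l(j)\, \hat{s}_j = \hat{x}_j w_j$ (cf. Lemma 5),
where $\hat{x}_j := \sum_k \hat{x}_{jk}$, and compute

\[
E_{\hat{x}}(j) = l(j)\, \hat{s}_j^\alpha = l(j)\, \hat{s}_j \cdot \hat{s}_j^{\alpha-1} = \hat{x}_j w_j \cdot \frac{\tilde{\lambda}_j}{\alpha w_j} = \tilde{\lambda}_j \frac{\hat{x}_j}{\alpha}.
\]

(b) Assume $j$ is finished by PD. Remember that $\tilde{s}_j$ denotes the speed assigned to $j$ when it
arrived and $\tilde{\lambda}_j$ got fixed. We have the relation $\tilde{s}_j = \delta^{-1/(\alpha-1)} \hat{s}_j$ (cf. Section 4.2). Let $s_{\min}$
denote the minimal speed of $j$'s trace in the final $(\tilde{x}, \tilde{y})$-schedule produced by PD. That
is, there is a tuple $(T_k, i) \in \mathrm{Tr}(j)$ such that the $i$-th fastest processor in $T_k$ runs at speed
$s_{\min}$ and $E_{\mathrm{PD}}(j) \ge l(j)\, s_{\min}^\alpha$. By Proposition 7 we must have $s_{\min} \ge \tilde{s}_j$. We compute

\[
E_{\hat{x}}(j) = l(j)\, \hat{s}_j^\alpha = \delta^{\alpha/(\alpha-1)}\, l(j)\, \tilde{s}_j^\alpha \le \delta^{\alpha/(\alpha-1)}\, l(j)\, s_{\min}^\alpha \le \delta^{\alpha/(\alpha-1)} E_{\mathrm{PD}}(j).
\]

(c) Applying Proposition 7 to all $(T_k, i) \in \mathrm{Tr}(j)$ yields that the total workload $L$ that is
processed by PD during $j$'s trace is at least $l(j)\, \tilde{s}_j - \tilde{x}_j w_j \ge l(j)\, \tilde{s}_j - w_j$. The minimum
energy necessary to process this workload in $l(j)$ time units is $l(j) \left( L / l(j) \right)^\alpha$. Using
$w_j / (l(j)\, \tilde{s}_j) = \delta^{1/(\alpha-1)} / \hat{x}_j$, we compute

\[
E_{\mathrm{PD}}(j) \ge l(j) \left( \frac{L}{l(j)} \right)^{\alpha}
  \ge l(j) \left( \frac{l(j)\, \tilde{s}_j - w_j}{l(j)} \right)^{\alpha}
  = l(j)\, \tilde{s}_j^\alpha \left( 1 - \frac{w_j}{l(j)\, \tilde{s}_j} \right)^{\alpha}
  = \delta^{-\alpha/(\alpha-1)} E_{\hat{x}}(j) \left( 1 - \frac{\delta^{1/(\alpha-1)}}{\hat{x}_j} \right)^{\alpha}.
\]

Rearranging the inequality yields the desired statement. ◄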

Note that the bound for unfinished jobs in Proposition 8 has an additional factor $> 1$
compared to the one for finished jobs. However, for large enough $\hat{x}_j$ this factor becomes
nearly one. Thus, we will apply this bound only in cases of large $\hat{x}_j$. If $\hat{x}_j$ is relatively small,
we will instead bound $E_{\hat{x}}(j)$ only by its value. We continue by describing the different types
of jobs we consider. In total, we differentiate between three job categories:

Finished Jobs. These are all jobs $j$ with $\tilde{y}_j = 1$ (i.e., jobs finished by PD). As mentioned
above, we bound both components $E_{\hat{x}}(j)$ and $\tilde{\lambda}_j$ of $g(\tilde{\lambda})$ by the actual energy consumption
of PD. We use $J_1 := \{ j \in J \mid \tilde{y}_j = 1 \}$ to refer to this job category.

Unfinished, Low-yield Jobs. We use the term low-yield jobs to refer to jobs not finished by
PD and which have a relatively small $\hat{x}_j$. That is, jobs of which the optimal infeasible
solution does not schedule too large a portion. Intuitively, the value of such jobs must
be small, because otherwise it would have been beneficial to schedule a larger portion
of them in the optimal infeasible solution. In this sense, these jobs are low-yield and
we will exploit this fact by bounding both components $E_{\hat{x}}(j)$ and $\tilde{\lambda}_j$ of $g(\tilde{\lambda})$ by the job
value PD is charged for not finishing $j$. More formally, this job category is defined as
$J_2 := \{ j \in J \mid \tilde{y}_j = 0 \wedge \hat{x}_j \le \frac{\alpha - \alpha^{1-\alpha}}{\alpha - 1} \}$.

Unfinished, High-yield Jobs. Correspondingly, the term high-yield jobs refers to jobs not
finished by PD and which have a relatively large $\hat{x}_j$. More exactly, these jobs are given by
$J_3 := \{ j \in J \mid \tilde{y}_j = 0 \wedge \hat{x}_j > \frac{\alpha - \alpha^{1-\alpha}}{\alpha - 1} \}$. This proves to be the most challenging case, as
neither do the jobs feature a particularly small value nor does PD invest any energy
into their execution. Instead, we use a mix of the job's value and the energy spent by
PD during $j$'s trace to account for its contribution. One has to carefully balance what
portions of $E_{\hat{x}}(j)$ and $\tilde{\lambda}_j$ to bound by either $E_{\mathrm{PD}}(j)$ or by $v_j$.

In accordance with these job categories, we split the value of the dual function by the
corresponding contributions. That is, $g(\tilde{\lambda}) = \sum_{i=1}^{3} g_i(\tilde{\lambda})$, where $g_i(\tilde{\lambda}) = (1 - \alpha) \sum_{j \in J_i} E_{\hat{x}}(j) + \sum_{j \in J_i} \tilde{\lambda}_j$.
The following lemmas bound each contribution separately.

► Lemma 9 (Finished Jobs). $g_1(\tilde{\lambda}) \ge \delta E_{\mathrm{PD}} + (1 - \alpha)\, \delta^{\alpha/(\alpha-1)} \sum_{j \in J_1} E_{\mathrm{PD}}(j)$.

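Proof. We have $g_1(\tilde{\lambda}) = (1 - \alpha) \sum_{j \in J_1} E_{\hat{x}}(j) + \sum_{j \in J_1} \tilde{\lambda}_j$. Using Proposition 8(b) and $\alpha > 1$,
we bound the first summand by $(1 - \alpha)\, \delta^{\alpha/(\alpha-1)} \sum_{j \in J_1} E_{\mathrm{PD}}(j)$. For the second summand, we get

\[
\sum_{j \in J_1} \tilde{\lambda}_j
  = \sum_{j \in J_1} \sum_{k=1}^{N} \tilde{x}_{jk} \tilde{\lambda}_j
  = \sum_{j \in J_1} \sum_{k=1}^{N} \tilde{x}_{jk}\, \delta \frac{\partial P_k}{\partial x_{jk}}(\tilde{x}_{1k}, \dots, \tilde{x}_{jk}, 0, \dots, 0)
\]
\[
  = \delta \sum_{k=1}^{N} \sum_{j \in J} \tilde{x}_{jk} \frac{\partial P_k}{\partial x_{jk}}(\tilde{x}_{1k}, \dots, \tilde{x}_{jk}, 0, \dots, 0)
  \ge \delta \sum_{k=1}^{N} P_k(\tilde{x}_{1k}, \dots, \tilde{x}_{nk}) = \delta E_{\mathrm{PD}}.
\]

The involved inequality is based on the fact that for any differentiable convex function
$f \colon \mathbb{R}^n \to \mathbb{R}$ with $f(0) = 0$ and $x \in \mathbb{R}^n_{\ge 0}$ we have $\sum_{j=1}^{n} x_j \frac{\partial f}{\partial x_j}(x_1, \dots, x_j, 0, \dots, 0) \ge f(x)$
(see, e.g., [8, Chapter 3]). Together the bounds yield the lemma's statement. ◄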
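The convexity inequality used in this proof can be illustrated numerically. The following is a minimal sketch for one particular convex function, $f(x) = (\sum_j w_j x_j)^\alpha$ on the nonnegative orthant, which is chosen here as an assumption for illustration (it resembles the energy of a single shared processor of unit length) and is not the general $P_k$ of the paper.

```python
def f(x, w, alpha):
    """Convex test function f(x) = (sum_j w_j * x_j)**alpha with f(0) = 0."""
    return sum(w_j * x_j for x_j, w_j in zip(x, w)) ** alpha


def prefix_gradient_sum(x, w, alpha):
    """Compute sum_j x_j * (df/dx_j)(x_1, ..., x_j, 0, ..., 0) for the function f above."""
    total, load = 0.0, 0.0
    for x_j, w_j in zip(x, w):
        load += w_j * x_j                      # workload of the first j coordinates
        # Partial derivative of f at the truncated point: alpha * w_j * load**(alpha - 1)
        total += x_j * alpha * w_j * load ** (alpha - 1.0)
    return total


x = [0.5, 1.0, 0.25]
w = [2.0, 1.0, 4.0]
lhs = prefix_gradient_sum(x, w, alpha=3.0)
rhs = f(x, w, alpha=3.0)
```

For this sample the left-hand side is 42 and the right-hand side is 27, in line with the inequality. The general statement follows by applying the gradient inequality of convex functions to each coordinate step and telescoping.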
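► Lemma 10 (Low-yield Jobs). $g_2(\tilde{\lambda}) \ge \alpha^{-\alpha} \sum_{j \in J_2} v_j$.

Proof. Proposition 8(a) together with the fact that $\tilde{\lambda}_j = v_j$ for $j \in J_2$ yields $E_{\hat{x}}(j) = v_j \frac{\hat{x}_j}{\alpha}$.
Applying this to $g_2(\tilde{\lambda})$ we get

\[
g_2(\tilde{\lambda}) = \sum_{j \in J_2} (1 - \alpha) E_{\hat{x}}(j) + \sum_{j \in J_2} \tilde{\lambda}_j
  = \sum_{j \in J_2} \frac{1 - \alpha}{\alpha} \hat{x}_j v_j + \sum_{j \in J_2} v_j
  = \sum_{j \in J_2} \left( 1 - \frac{\alpha - 1}{\alpha} \hat{x}_j \right) v_j
\]
\[
  \ge \sum_{j \in J_2} \left( 1 - \frac{\alpha - 1}{\alpha} \cdot \frac{\alpha - \alpha^{1-\alpha}}{\alpha - 1} \right) v_j
  = \alpha^{-\alpha} \sum_{j \in J_2} v_j. \qquad ◄
\]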

► Lemma 11 (High-yield Jobs). $g_3(\tilde{\lambda}) \ge (1 - \alpha)\, \alpha^{-\alpha} \sum_{j \in J_3} E_{\mathrm{PD}}(j) + \alpha^{-\alpha} \sum_{j \in J_3} v_j$.



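Proof. We make use of both Proposition 8(a) and Proposition 8(c). First note that the
prerequisite $\delta \le \frac{1}{\alpha^{\alpha-1}}$ together with $\alpha > 1$ and $j \in J_3$ gives us the relation
$\delta^{1/(\alpha-1)} \le \frac{1}{\alpha} < 1 < \frac{\alpha - \alpha^{1-\alpha}}{\alpha - 1} < \hat{x}_j$. This allows us to apply Proposition 8(c). The second
summand of $g_3(\tilde{\lambda})$ is split into two parts, one of which is accounted for by energy invested
by PD and the other one by lost value due to unfinished jobs:

\[
g_3(\tilde{\lambda}) = \sum_{j \in J_3} (1 - \alpha) E_{\hat{x}}(j) + \sum_{j \in J_3} \left( 1 - \alpha^{-\alpha} \right) v_j + \sum_{j \in J_3} \alpha^{-\alpha} v_j
\]
\[
  = \sum_{j \in J_3} (1 - \alpha) E_{\hat{x}}(j) + \sum_{j \in J_3} \left( 1 - \alpha^{-\alpha} \right) \frac{\alpha E_{\hat{x}}(j)}{\hat{x}_j} + \sum_{j \in J_3} \alpha^{-\alpha} v_j
\]
\[
  = \sum_{j \in J_3} (1 - \alpha) E_{\hat{x}}(j) \left( 1 - \frac{\alpha - \alpha^{1-\alpha}}{(\alpha - 1)\, \hat{x}_j} \right) + \sum_{j \in J_3} \alpha^{-\alpha} v_j
\]
\[
  \ge \sum_{j \in J_3} (1 - \alpha)\, \delta^{\alpha/(\alpha-1)} \left( 1 - \frac{\delta^{1/(\alpha-1)}}{\hat{x}_j} \right)^{-\alpha} E_{\mathrm{PD}}(j) \left( 1 - \frac{\alpha - \alpha^{1-\alpha}}{(\alpha - 1)\, \hat{x}_j} \right) + \sum_{j \in J_3} \alpha^{-\alpha} v_j
\]
\[
  \ge \sum_{j \in J_3} (1 - \alpha)\, \alpha^{-\alpha} \left( 1 - \frac{1}{\alpha \hat{x}_j} \right)^{-\alpha} \left( 1 - \frac{\alpha - \alpha^{1-\alpha}}{(\alpha - 1)\, \hat{x}_j} \right) E_{\mathrm{PD}}(j) + \sum_{j \in J_3} \alpha^{-\alpha} v_j
\]
\[
  \ge (1 - \alpha)\, \alpha^{-\alpha} \sum_{j \in J_3} E_{\mathrm{PD}}(j) + \sum_{j \in J_3} \alpha^{-\alpha} v_j.
\]

The first inequality applies Proposition 8(c), the penultimate inequality the relations deduced
from the prerequisite, and the last inequality is the application of Bernoulli's inequality:
it gives $\left( 1 - \frac{1}{\alpha \hat{x}_j} \right)^{\alpha} \ge 1 - \frac{1}{\hat{x}_j} \ge 1 - \frac{\alpha - \alpha^{1-\alpha}}{(\alpha - 1)\, \hat{x}_j}$, so the product of the two bracketed
factors is at most one. Note that the factor $(1 - \alpha)$ is negative, which is why upper bounds
on the remaining factors yield lower bounds on the whole expression. ◄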
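The role of the threshold $\frac{\alpha - \alpha^{1-\alpha}}{\alpha - 1}$ in the low-yield and high-yield cases can be checked numerically. The following is a minimal sketch, assuming $\delta = \alpha^{1-\alpha}$; the function names are hypothetical and the grid of test points is arbitrary.

```python
def threshold(alpha):
    """Boundary between low-yield and high-yield jobs."""
    return (alpha - alpha ** (1.0 - alpha)) / (alpha - 1.0)


def low_yield_coefficient(alpha):
    """Coefficient of v_j for a low-yield job whose portion equals the threshold."""
    return 1.0 - (alpha - 1.0) / alpha * threshold(alpha)


def high_yield_factor(alpha, x_hat):
    """Product of the two bracketed factors bounded via Bernoulli's inequality."""
    return ((1.0 - 1.0 / (alpha * x_hat)) ** (-alpha)
            * (1.0 - threshold(alpha) / x_hat))


# Evaluate the factor for portions strictly above the threshold.
samples = [(alpha, threshold(alpha) * scale, high_yield_factor(alpha, threshold(alpha) * scale))
           for alpha in (1.5, 2.0, 3.0)
           for scale in (1.001, 2.0, 10.0, 1000.0)]
```

On these samples the low-yield coefficient equals $\alpha^{-\alpha}$ and the high-yield factor stays between zero and one, which matches the two places where the threshold enters the analysis.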



4.4 Deriving the Tight Competitive Ratio 

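It remains to derive our final upper bound on the competitive ratio of PD. We do so by
combining the bounds from Lemma 9, Lemma 10, and Lemma 11.

► Theorem 3. The competitive ratio of algorithm PD with the parameter $\delta$ set to $\frac{1}{\alpha^{\alpha-1}}$ is
at most $\alpha^\alpha$. Moreover, there is a problem instance for which PD is worse than an optimal
algorithm by a factor of exactly $\alpha^\alpha$. That is, our upper bound is optimal.

Proof. If we combine the results from Lemma 9 to Lemma 11, we get

\[
g(\tilde{\lambda}) \ge \alpha^{1-\alpha} E_{\mathrm{PD}} + (1 - \alpha)\, \alpha^{-\alpha} \sum_{j \in J_1 \cup J_3} E_{\mathrm{PD}}(j) + \alpha^{-\alpha} \sum_{j \in J_2 \cup J_3} v_j
\]
\[
  \ge \alpha^{1-\alpha} E_{\mathrm{PD}} + (1 - \alpha)\, \alpha^{-\alpha} \sum_{j \in J} E_{\mathrm{PD}}(j) + \alpha^{-\alpha} \sum_{j \in J_2 \cup J_3} v_j
\]
\[
  \ge \left( \alpha^{1-\alpha} + (1 - \alpha)\, \alpha^{-\alpha} \right) E_{\mathrm{PD}} + \alpha^{-\alpha} \sum_{j \in J_2 \cup J_3} v_j = \alpha^{-\alpha}\, \mathrm{cost}(\mathrm{PD}).
\]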

Now, let OPT denote an optimal schedule for the current problem instance. Moreover, let
OPT$'$ denote an optimal solution to the relaxed mathematical program (CP). Obviously,
it holds that $\mathrm{cost}(\mathrm{OPT}') \le \mathrm{cost}(\mathrm{OPT})$. By duality, we know that $g(\tilde{\lambda}) \le \mathrm{cost}(\mathrm{OPT}')$. By
combining these inequalities we can bound PD's competitiveness by

\[
\mathrm{cost}(\mathrm{PD}) \le \alpha^\alpha g(\tilde{\lambda}) \le \alpha^\alpha\, \mathrm{cost}(\mathrm{OPT}') \le \alpha^\alpha\, \mathrm{cost}(\mathrm{OPT}).
\]
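The way the three lemma bounds combine can be illustrated with a few numbers. The following is a minimal sketch, assuming $\delta = \alpha^{1-\alpha}$ and using made-up values for the total energy of PD, the energies on the job traces, and the lost job values; the function name is hypothetical.

```python
def dual_lower_bound(alpha, e_pd_total, e_pd_traces, lost_values):
    """Combined lower bound on the dual function value from Lemmas 9 to 11."""
    return (alpha ** (1.0 - alpha) * e_pd_total
            + (1.0 - alpha) * alpha ** (-alpha) * sum(e_pd_traces)
            + alpha ** (-alpha) * sum(lost_values))


alpha = 2.0
e_pd_total = 10.0            # total energy of PD (sample value)
e_pd_traces = [3.0, 4.0]     # energy on the job traces, summing to at most e_pd_total
lost_values = [5.0]          # values of the unfinished jobs
bound = dual_lower_bound(alpha, e_pd_total, e_pd_traces, lost_values)
scaled_cost = alpha ** (-alpha) * (e_pd_total + sum(lost_values))
# Worst case for the bound: the traces cover all of PD's energy.
tight_bound = dual_lower_bound(alpha, e_pd_total, [e_pd_total], lost_values)
```

Because the trace energies enter with the negative coefficient $(1 - \alpha)\alpha^{-\alpha}$, the bound is smallest when the traces account for all of PD's energy, and in that case it equals $\alpha^{-\alpha}\,\mathrm{cost}(\mathrm{PD})$ since $\alpha^{1-\alpha} + (1 - \alpha)\alpha^{-\alpha} = \alpha^{-\alpha}$.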

For the lower bound, consider a single processor and assume the job values are high
enough to ensure that PD finishes all jobs. We create a job instance of $n$ jobs in the same
way as done in [3] for the lower bound on OA and AVR. That is, job $j \in J = \{1, 2, \dots, n\}$
arrives at time j — 1 and has workload (n — j + All jobs have the same deadline n. 

Now, whenever one of the jobs arrives, PD schedules all remaining jobs at the energy-optimal
(i.e., minimal) speed as pool jobs. In other words, it computes a schedule that is optimal for
the remaining known work. This is exactly what OA does (hence its name), which means
that we get the same lower bound of $\alpha^\alpha$ as for OA (cf. [3, Lemma 3.2]). ◄



5 Conclusion 



We presented a new algorithm and an analysis based on duality theory for scheduling valuable
jobs on multiple speed-scalable processors. Using duality theory to approach the analysis of
energy-efficient scheduling algorithms was recently proposed by Gupta, Krishnaswamy, and
Pruhs [12]. Given that the first formal proof of the original offline algorithm's optimality was
achieved by means of duality theory using the KKT conditions [2], it seems that this is a
very natural way to approach this kind of problem. However, almost all results for online
algorithms in this area use amortized competitiveness arguments similar to the original proof
of the competitiveness of OA, one of the first and most important online algorithms for
energy-efficient scheduling. While this approach proved to be elegant and very powerful, designing
suitable potential functions is difficult and requires considerable experience with
the topic. Adapting these potential functions to new model variations and generalizations,
or tuning them to narrow the gap to the known lower bounds, is non-trivial and remains a
challenging task. We think that using well-developed utilities from duality theory for convex
programming may prove to be a worthwhile and promising alternative approach. Our results
underline this conjecture, not only improving upon known results proved using the classical
method but also generalizing them to the important case of multiple processors.